Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 39948 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 22 |
| Duplicate rows (%) | 0.1% |
| Total size in memory | 3.7 MiB |
| Average record size in memory | 96.0 B |
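The dataset-level figures above can be cross-checked against each other. The following is a minimal Python sketch that does so; the variable names are illustrative, and the assumption that every column stores 8 bytes per value is inferred from the numbers rather than stated by the report.

```python
# Values copied from the dataset statistics table above.
n_rows = 39948
n_columns = 12
duplicate_rows = 22
avg_record_bytes = 96.0

# Share of duplicate rows, as a percentage.
duplicate_pct = 100 * duplicate_rows / n_rows

# Assumption: each of the 12 columns holds one 8-byte value per row.
bytes_per_value = avg_record_bytes / n_columns

# Total and per-column memory implied by the average record size.
total_mib = n_rows * avg_record_bytes / 2**20
column_kib = n_rows * bytes_per_value / 1024

print(f"duplicates : {duplicate_pct:.1f}%")
print(f"per value  : {bytes_per_value:.0f} B")
print(f"total      : {total_mib:.1f} MiB")
print(f"per column : {column_kib:.1f} KiB")
```

Under that assumption the per-column figure agrees with the 312.1 KiB memory size reported for each variable.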
Variable types
| NUM | 9 |
|---|---|
| CAT | 3 |
Reproduction
| Analysis started | 2021-03-11 01:58:56.458074 |
|---|---|
| Analysis finished | 2021-03-11 01:59:13.133202 |
| Duration | 16.68 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Dataset has 22 (0.1%) duplicate rows | Duplicates |
|---|---|
| impression is highly skewed (γ1 = 159.0616115) | Skewed |
| query_id has 571 (1.4%) zeros | Zeros |
| keyword_id has 570 (1.4%) zeros | Zeros |
| user_id has 9633 (24.1%) zeros | Zeros |
click
Categorical
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 312.1 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 33220 | 83.2% | |
| 1 | 6728 | 16.8% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
impression
Real number (ℝ≥0)
| Distinct count | 99 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.100205266846901 |
|---|---|
| Minimum | 1 |
| Maximum | 11820 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 312.1 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 11820 |
| Range | 11819 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 65.86738287 |
|---|---|
| Coefficient of variation (CV) | 31.36235486 |
| Kurtosis | 27130.68877 |
| Mean | 2.100205267 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 159.0616115 |
| Sum | 83899 |
| Variance | 4338.512126 |
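Several of the impression statistics are derived from others, which gives a quick way to sanity-check the report. The following minimal Python sketch recomputes them from the values printed above; the variable names are illustrative and are not part of pandas-profiling.

```python
# Values copied from the impression statistics reported above.
n_obs = 39948
mean = 2.100205266846901
std = 65.86738287
minimum, maximum = 1, 11820
q1, q3 = 1, 1

cv = std / mean                  # coefficient of variation
variance = std ** 2              # squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
total = mean * n_obs             # sum of all values

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.4f}")
print(f"Range    = {value_range}")
print(f"IQR      = {iqr}")
print(f"Sum      = {total:.0f}")
```

The IQR of 0 is consistent with the frequency table for this variable, where 84.7% of rows have the value 1, so both quartiles fall on the same value.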
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 33846 | 84.7% | |
| 2 | 3670 | 9.2% | |
| 3 | 1032 | 2.6% | |
| 4 | 444 | 1.1% | |
| 5 | 222 | 0.6% | |
| 6 | 158 | 0.4% | |
| 7 | 105 | 0.3% | |
| 8 | 78 | 0.2% | |
| 10 | 38 | 0.1% | |
| 9 | 33 | 0.1% | |
| Other values (89) | 322 | 0.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 33846 | 84.7% | |
| 2 | 3670 | 9.2% | |
| 3 | 1032 | 2.6% | |
| 4 | 444 | 1.1% | |
| 5 | 222 | 0.6% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 11820 | 1 | < 0.1% | |
| 5467 | 1 | < 0.1% | |
| 1009 | 1 | < 0.1% | |
| 584 | 1 | < 0.1% | |
| 490 | 1 | < 0.1% |
url_hash
Real number (ℝ≥0)
| Distinct count | 6941 |
|---|---|
| Unique (%) | 17.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.641350419145736e+18 |
|---|---|
| Minimum | 482436910553333.0 |
| Maximum | 1.844094316957687e+19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 312.1 KiB |
Quantile statistics
| Minimum | 4.824369106e+14 |
|---|---|
| 5-th percentile | 1.33593611e+18 |
| Q1 | 5.468727571e+18 |
| median | 1.034946865e+19 |
| Q3 | 1.434039016e+19 |
| 95-th percentile | 1.702769257e+19 |
| Maximum | 1.844094317e+19 |
| Range | 1.844046073e+19 |
| Interquartile range (IQR) | 8.871662586e+18 |
Descriptive statistics
| Standard deviation | 4.98670453e+18 |
|---|---|
| Coefficient of variation (CV) | 0.5172205462 |
| Kurtosis | -1.115120878 |
| Mean | 9.641350419e+18 |
| Median Absolute Deviation (MAD) | 3.990921506e+18 |
| Skewness | -0.2372171121 |
| Sum | 3.851526665e+23 |
| Variance | 2.486722207e+37 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1.434039016e+19 | 3891 | 9.7% | |
| 1.2057879e+19 | 3381 | 8.5% | |
| 7.903914528e+18 | 1107 | 2.8% | |
| 4.298118681e+18 | 421 | 1.1% | |
| 1.453186765e+19 | 395 | 1.0% | |
| 1.375625754e+19 | 394 | 1.0% | |
| 5.851252814e+18 | 365 | 0.9% | |
| 1.475657876e+19 | 295 | 0.7% | |
| 1.514548016e+19 | 295 | 0.7% | |
| 2.69285962e+18 | 287 | 0.7% | |
| Other values (6931) | 29117 | 72.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 4.824369106e+14 | 1 | < 0.1% | |
| 1.234866104e+15 | 2 | < 0.1% | |
| 2.068449938e+15 | 3 | < 0.1% | |
| 1.711420047e+16 | 1 | < 0.1% | |
| 2.416657436e+16 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1.844094317e+19 | 3 | < 0.1% | |
| 1.843882671e+19 | 7 | < 0.1% | |
| 1.843731353e+19 | 2 | < 0.1% | |
| 1.843728766e+19 | 2 | < 0.1% | |
| 1.843108375e+19 | 1 | < 0.1% |
ad_id
Real number (ℝ≥0)
| Distinct count | 19228 |
|---|---|
| Unique (%) | 48.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16016715.903249225 |
|---|---|
| Minimum | 1000515 |
| Maximum | 22227340 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 312.1 KiB |
Quantile statistics
| Minimum | 1000515 |
|---|---|
| 5-th percentile | 3066490 |
| Q1 | 9027238 |
| median | 20303729.5 |
| Q3 | 21163923 |
| 95-th percentile | 21872897.7 |
| Maximum | 22227340 |
| Range | 21226825 |
| Interquartile range (IQR) | 12136685 |
Descriptive statistics
| Standard deviation | 7222259.539 |
|---|---|
| Coefficient of variation (CV) | 0.4509201251 |
| Kurtosis | -1.022447811 |
| Mean | 16016715.9 |
| Median Absolute Deviation (MAD) | 1021970.5 |
| Skewness | -0.8821406758 |
| Sum | 6.398357669e+11 |
| Variance | 5.216103284e+13 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 9027213 | 355 | 0.9% | |
| 20192676 | 221 | 0.6% | |
| 21522776 | 216 | 0.5% | |
| 20908196 | 207 | 0.5% | |
| 3048011 | 153 | 0.4% | |
| 20644045 | 152 | 0.4% | |
| 21163923 | 140 | 0.4% | |
| 3065545 | 127 | 0.3% | |
| 20017078 | 117 | 0.3% | |
| 20030165 | 110 | 0.3% | |
| Other values (19218) | 38150 | 95.5% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1000515 | 2 | < 0.1% | |
| 1000699 | 2 | < 0.1% | |
| 1000806 | 2 | < 0.1% | |
| 1000829 | 1 | < 0.1% | |
| 1000830 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 22227340 | 2 | < 0.1% | |
| 22227123 | 1 | < 0.1% | |
| 22227066 | 1 | < 0.1% | |
| 22226792 | 1 | < 0.1% | |
| 22226685 | 1 | < 0.1% |
advertiser_id
Real number (ℝ≥0)
| Distinct count | 6064 |
|---|---|
| Unique (%) | 15.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22454.496545509162 |
|---|---|
| Minimum | 82 |
| Maximum | 39074 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 312.1 KiB |
Quantile statistics
| Minimum | 82 |
|---|---|
| 5-th percentile | 1268 |
| Q1 | 13476.5 |
| median | 23808 |
| Q3 | 32124 |
| 95-th percentile | 37422 |
| Maximum | 39074 |
| Range | 38992 |
| Interquartile range (IQR) | 18647.5 |
Descriptive statistics
| Standard deviation | 11796.0858 |
|---|---|
| Coefficient of variation (CV) | 0.5253329004 |
| Kurtosis | -0.8357317704 |
| Mean | 22454.49655 |
| Median Absolute Deviation (MAD) | 9024 |
| Skewness | -0.6158151524 |
| Sum | 897012228 |
| Variance | 139147640.2 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 27961 | 3363 | 8.4% | |
| 23808 | 2420 | 6.1% | |
| 23777 | 1592 | 4.0% | |
| 1325 | 790 | 2.0% | |
| 23778 | 481 | 1.2% | |
| 1268 | 448 | 1.1% | |
| 385 | 421 | 1.1% | |
| 23807 | 395 | 1.0% | |
| 28698 | 365 | 0.9% | |
| 24354 | 356 | 0.9% | |
| Other values (6054) | 29317 | 73.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 82 | 29 | 0.1% | |
| 85 | 1 | < 0.1% | |
| 87 | 1 | < 0.1% | |
| 88 | 3 | < 0.1% | |
| 94 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 39074 | 2 | < 0.1% | |
| 38970 | 1 | < 0.1% | |
| 38961 | 1 | < 0.1% | |
| 38956 | 1 | < 0.1% | |
| 38942 | 1 | < 0.1% |
depth
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 312.1 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2 | 19439 | 48.7% | |
| 1 | 11053 | 27.7% | |
| 3 | 9456 | 23.7% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
position
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 312.1 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 24417 | 61.1% | |
| 2 | 12532 | 31.4% | |
| 3 | 2999 | 7.5% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
query_id
Real number (ℝ≥0)
| Distinct count | 30748 |
|---|---|
| Unique (%) | 77.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3142145.502903775 |
|---|---|
| Minimum | 0 |
| Maximum | 26240100 |
| Zeros | 571 |
| Zeros (%) | 1.4% |
| Memory size | 312.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 2364.25 |
| median | 112836.5 |
| Q3 | 3147909.25 |
| 95-th percentile | 17386650.7 |
| Maximum | 26240100 |
| Range | 26240100 |
| Interquartile range (IQR) | 3145545 |
Descriptive statistics
| Standard deviation | 5841539.586 |
|---|---|
| Coefficient of variation (CV) | 1.859092642 |
| Kurtosis | 3.859580544 |
| Mean | 3142145.503 |
| Median Absolute Deviation (MAD) | 112830.5 |
| Skewness | 2.151205504 |
| Sum | 1.255224286e+11 |
| Variance | 3.412358473e+13 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 571 | 1.4% | |
| 1 | 414 | 1.0% | |
| 2 | 217 | 0.5% | |
| 4 | 176 | 0.4% | |
| 8 | 169 | 0.4% | |
| 5 | 164 | 0.4% | |
| 3 | 154 | 0.4% | |
| 6 | 140 | 0.4% | |
| 7 | 120 | 0.3% | |
| 15 | 83 | 0.2% | |
| Other values (30738) | 37740 | 94.5% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 571 | 1.4% | |
| 1 | 414 | 1.0% | |
| 2 | 217 | 0.5% | |
| 3 | 154 | 0.4% | |
| 4 | 176 | 0.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 26240100 | 1 | < 0.1% | |
| 26231458 | 1 | < 0.1% | |
| 26230009 | 1 | < 0.1% | |
| 26229427 | 1 | < 0.1% | |
| 26229194 | 1 | < 0.1% |
keyword_id
Real number (ℝ≥0)
| Distinct count | 19803 |
|---|---|
| Unique (%) | 49.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35194.431360769 |
|---|---|
| Minimum | 0 |
| Maximum | 1243163 |
| Zeros | 570 |
| Zeros (%) | 1.4% |
| Memory size | 312.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 9 |
| Q1 | 370 |
| median | 3389 |
| Q3 | 21030 |
| 95-th percentile | 178425.05 |
| Maximum | 1243163 |
| Range | 1243163 |
| Interquartile range (IQR) | 20660 |
Descriptive statistics
| Standard deviation | 100914.8155 |
|---|---|
| Coefficient of variation (CV) | 2.867351784 |
| Kurtosis | 44.26949755 |
| Mean | 35194.43136 |
| Median Absolute Deviation (MAD) | 3355 |
| Skewness | 5.862208181 |
| Sum | 1405947144 |
| Variance | 1.01838e+10 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 570 | 1.4% | |
| 1 | 383 | 1.0% | |
| 2 | 182 | 0.5% | |
| 8 | 164 | 0.4% | |
| 3 | 159 | 0.4% | |
| 6 | 152 | 0.4% | |
| 4 | 150 | 0.4% | |
| 10 | 140 | 0.4% | |
| 5 | 125 | 0.3% | |
| 9 | 116 | 0.3% | |
| Other values (19793) | 37807 | 94.6% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 570 | 1.4% | |
| 1 | 383 | 1.0% | |
| 2 | 182 | 0.5% | |
| 3 | 159 | 0.4% | |
| 4 | 150 | 0.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1243163 | 1 | < 0.1% | |
| 1242910 | 1 | < 0.1% | |
| 1240995 | 1 | < 0.1% | |
| 1234956 | 1 | < 0.1% | |
| 1231769 | 1 | < 0.1% |
title_id
Real number (ℝ≥0)
| Distinct count | 25321 |
|---|---|
| Unique (%) | 63.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 173282.90677881247 |
|---|---|
| Minimum | 0 |
| Maximum | 4050208 |
| Zeros | 355 |
| Zeros (%) | 0.9% |
| Memory size | 312.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 15 |
| Q1 | 670.75 |
| median | 10654 |
| Q3 | 100289.5 |
| 95-th percentile | 959922.65 |
| Maximum | 4050208 |
| Range | 4050208 |
| Interquartile range (IQR) | 99618.75 |
Descriptive statistics
| Standard deviation | 465674.7878 |
|---|---|
| Coefficient of variation (CV) | 2.68736713 |
| Kurtosis | 24.91604281 |
| Mean | 173282.9068 |
| Median Absolute Deviation (MAD) | 10626 |
| Skewness | 4.604918963 |
| Sum | 6922305560 |
| Variance | 2.16853008e+11 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 355 | 0.9% | |
| 4 | 168 | 0.4% | |
| 2 | 167 | 0.4% | |
| 1 | 152 | 0.4% | |
| 3 | 135 | 0.3% | |
| 5 | 126 | 0.3% | |
| 8 | 114 | 0.3% | |
| 7 | 114 | 0.3% | |
| 9 | 113 | 0.3% | |
| 6 | 111 | 0.3% | |
| Other values (25311) | 38393 | 96.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 355 | 0.9% | |
| 1 | 152 | 0.4% | |
| 2 | 167 | 0.4% | |
| 3 | 135 | 0.3% | |
| 4 | 168 | 0.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 4050208 | 1 | < 0.1% | |
| 4039102 | 1 | < 0.1% | |
| 4028916 | 1 | < 0.1% | |
| 4028814 | 1 | < 0.1% | |
| 4027734 | 1 | < 0.1% |
description_id
Real number (ℝ≥0)
| Distinct count | 22381 |
|---|---|
| Unique (%) | 56.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 111150.89986983078 |
|---|---|
| Minimum | 0 |
| Maximum | 3171504 |
| Zeros | 355 |
| Zeros (%) | 0.9% |
| Memory size | 312.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 12 |
| Q1 | 356 |
| median | 5048 |
| Q3 | 52861.75 |
| 95-th percentile | 597557.2 |
| Maximum | 3171504 |
| Range | 3171504 |
| Interquartile range (IQR) | 52505.75 |
Descriptive statistics
| Standard deviation | 328374.223 |
|---|---|
| Coefficient of variation (CV) | 2.954310072 |
| Kurtosis | 32.13248565 |
| Mean | 111150.8999 |
| Median Absolute Deviation (MAD) | 5025 |
| Skewness | 5.185337063 |
| Sum | 4440256148 |
| Variance | 1.078296304e+11 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 355 | 0.9% | |
| 1 | 190 | 0.5% | |
| 5 | 167 | 0.4% | |
| 2 | 159 | 0.4% | |
| 4 | 154 | 0.4% | |
| 3 | 152 | 0.4% | |
| 6 | 146 | 0.4% | |
| 9 | 143 | 0.4% | |
| 7 | 135 | 0.3% | |
| 8 | 129 | 0.3% | |
| Other values (22371) | 38218 | 95.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 355 | 0.9% | |
| 1 | 190 | 0.5% | |
| 2 | 159 | 0.4% | |
| 3 | 152 | 0.4% | |
| 4 | 154 | 0.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 3171504 | 1 | < 0.1% | |
| 3169356 | 1 | < 0.1% | |
| 3162659 | 1 | < 0.1% | |
| 3153679 | 1 | < 0.1% | |
| 3152468 | 1 | < 0.1% |
user_id
Real number (ℝ≥0)
| Distinct count | 30114 |
|---|---|
| Unique (%) | 75.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3669622.261865425 |
|---|---|
| Minimum | 0 |
| Maximum | 23907337 |
| Zeros | 9633 |
| Zeros (%) | 24.1% |
| Memory size | 312.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1472.25 |
| median | 888386.5 |
| Q3 | 5129631.25 |
| 95-th percentile | 16392696.65 |
| Maximum | 23907337 |
| Range | 23907337 |
| Interquartile range (IQR) | 5128159 |
Descriptive statistics
| Standard deviation | 5492058.472 |
|---|---|
| Coefficient of variation (CV) | 1.49662774 |
| Kurtosis | 2.398862313 |
| Mean | 3669622.262 |
| Median Absolute Deviation (MAD) | 888386.5 |
| Skewness | 1.777911493 |
| Sum | 1.465940701e+11 |
| Variance | 3.016270626e+13 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 9633 | 24.1% | |
| 2 | 6 | < 0.1% | |
| 187 | 4 | < 0.1% | |
| 124 | 3 | < 0.1% | |
| 229 | 3 | < 0.1% | |
| 61 | 3 | < 0.1% | |
| 125 | 3 | < 0.1% | |
| 154 | 3 | < 0.1% | |
| 52 | 3 | < 0.1% | |
| 56 | 3 | < 0.1% | |
| Other values (30104) | 30284 | 75.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 9633 | 24.1% | |
| 1 | 1 | < 0.1% | |
| 2 | 6 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 5 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 23907337 | 1 | < 0.1% | |
| 23902732 | 1 | < 0.1% | |
| 23894682 | 1 | < 0.1% | |
| 23888301 | 1 | < 0.1% | |
| 23884533 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect the magnitude of r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
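The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report itself uses; the function name `pearson_r` and the sample data are made up for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations; the 1/n factors that
    # turn them into covariance and variances cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(x, [2 * v + 1 for v in x]))    # increasing line
print(pearson_r(x, [100 * v - 7 for v in x]))  # steeper, shifted line
print(pearson_r(x, [10 - 3 * v for v in x]))   # decreasing line
```

The first two calls differ only in location and scale of the second variable, so they should give the same r up to floating-point rounding.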
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
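As a rough illustration of that recipe, the sketch below ranks both variables and then applies the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) are hypothetical, and giving tied values the average of their ranks is an assumption of this example, not something the text above specifies.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but not linear
print(pearson_r(x, y))    # below 1: the relationship is not a straight line
print(spearman_rho(x, y)) # 1 up to rounding: the ordering is preserved
```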
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
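The pair-counting definition above translates directly into code. The sketch below is a plain O(n²) illustration with a hypothetical function name; it counts tied pairs as neither concordant nor discordant, which is an assumption of this example. Statistical libraries often apply a tie-adjusted variant instead, so their results can differ on data with many ties.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        pairs += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted in neither group
    if pairs == 0:
        raise ValueError("need at least two observations")
    return (concordant - discordant) / pairs

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]   # one swapped pair out of six
print(kendall_tau(xs, ys))
print(kendall_tau(xs, [4, 3, 2, 1]))
```

In the first call five of the six pairs are concordant and one is discordant, giving (5 - 1) / 6.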
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | click | impression | url_hash | ad_id | advertiser_id | depth | position | query_id | keyword_id | title_id | description_id | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1.071003e+19 | 8343295 | 11700 | 3 | 3 | 7702266 | 21264 | 27892 | 1559 | 0 |
| 1 | 1 | 1 | 1.736385e+19 | 20017077 | 23798 | 1 | 1 | 93079 | 35498 | 4 | 36476 | 562934 |
| 2 | 0 | 1 | 8.915473e+18 | 21348354 | 36654 | 1 | 1 | 10981 | 19975 | 36105 | 33292 | 11621116 |
| 3 | 0 | 1 | 4.426693e+18 | 20366086 | 33280 | 3 | 3 | 0 | 5942 | 4057 | 4390 | 8778348 |
| 4 | 0 | 1 | 1.157260e+19 | 6803526 | 10790 | 2 | 1 | 9881978 | 60593 | 25242 | 1679 | 12118311 |
| 5 | 1 | 1 | 2.827577e+17 | 21186478 | 35793 | 2 | 1 | 163315 | 4871 | 3257 | 1153 | 2886008 |
| 6 | 0 | 1 | 8.813903e+18 | 20886690 | 34840 | 2 | 2 | 316 | 543 | 2206 | 2888 | 7589739 |
| 7 | 0 | 1 | 3.811035e+18 | 21367376 | 20667 | 3 | 2 | 2601439 | 118 | 9594 | 9705 | 579253 |
| 8 | 0 | 1 | 9.806838e+18 | 21811752 | 37737 | 3 | 2 | 1631 | 333 | 841 | 2175 | 5277279 |
| 9 | 1 | 1 | 1.434039e+19 | 9027213 | 23808 | 2 | 1 | 5 | 1 | 0 | 0 | 11808635 |
Last rows
| | click | impression | url_hash | ad_id | advertiser_id | depth | position | query_id | keyword_id | title_id | description_id | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39938 | 0 | 2 | 1.434039e+19 | 21163921 | 23808 | 1 | 1 | 20912536 | 51 | 337 | 38 | 40187 |
| 39939 | 0 | 1 | 1.544609e+19 | 21098737 | 7073 | 1 | 1 | 105129 | 10273 | 76575 | 193603 | 22657879 |
| 39940 | 0 | 1 | 1.637837e+19 | 21229183 | 35860 | 2 | 1 | 48636 | 26467 | 35848 | 61665 | 1485104 |
| 39941 | 0 | 1 | 1.146304e+19 | 21250008 | 35364 | 2 | 2 | 879683 | 647353 | 3956650 | 2788319 | 8276 |
| 39942 | 0 | 1 | 1.768833e+19 | 20382992 | 17403 | 2 | 2 | 11881210 | 92 | 1314 | 1673 | 72733 |
| 39943 | 0 | 1 | 3.593550e+18 | 21898643 | 37867 | 2 | 1 | 12825939 | 1091 | 1657 | 1914 | 0 |
| 39944 | 0 | 1 | 1.760828e+19 | 20575578 | 8873 | 2 | 1 | 11699 | 8338 | 7866 | 9210 | 19487 |
| 39945 | 0 | 5 | 9.613260e+18 | 21183848 | 18716 | 2 | 1 | 243826 | 9594 | 8881 | 13277 | 2305 |
| 39946 | 0 | 1 | 9.750423e+18 | 21222438 | 35880 | 3 | 3 | 7130804 | 13078 | 943122 | 1436 | 0 |
| 39947 | 0 | 1 | 1.205788e+19 | 20180245 | 27961 | 1 | 1 | 21659 | 39732 | 88617 | 88724 | 5602668 |
Most frequent
| | click | impression | url_hash | ad_id | advertiser_id | depth | position | query_id | keyword_id | title_id | description_id | user_id | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1.251029e+18 | 20346589 | 16142 | 3 | 2 | 117400 | 86846 | 538027 | 69985 | 7431 | 2 |
| 1 | 0 | 1 | 2.670953e+18 | 20172890 | 23805 | 3 | 2 | 19 | 47 | 14995 | 16 | 6652930 | 2 |
| 2 | 0 | 1 | 2.994925e+18 | 20061287 | 23803 | 1 | 1 | 1465001 | 13235 | 106 | 73 | 21960715 | 2 |
| 3 | 0 | 1 | 4.298119e+18 | 21866026 | 385 | 1 | 1 | 305307 | 14735 | 24865 | 23556 | 3181 | 2 |
| 4 | 0 | 1 | 5.824342e+18 | 4385495 | 21959 | 3 | 3 | 5772745 | 32451 | 232499 | 218397 | 0 | 2 |
| 5 | 0 | 1 | 7.903915e+18 | 21162356 | 1325 | 1 | 1 | 4955732 | 365 | 578 | 953 | 0 | 2 |
| 6 | 0 | 1 | 7.903915e+18 | 21372653 | 2010 | 1 | 1 | 20766562 | 75 | 509 | 843 | 0 | 2 |
| 7 | 0 | 1 | 9.317847e+18 | 21299050 | 33746 | 2 | 1 | 2427067 | 188434 | 934170 | 2556 | 0 | 2 |
| 8 | 0 | 1 | 1.205788e+19 | 20157628 | 27961 | 3 | 3 | 0 | 0 | 49 | 60 | 9802064 | 2 |
| 9 | 0 | 1 | 1.341274e+19 | 21449769 | 27712 | 3 | 1 | 275984 | 25450 | 74953 | 10756 | 1647403 | 2 |